Abstract

A software architecture is presented that introduces several agents, each focusing on a different aspect of
path planning for multiple autonomous unmanned aerial vehicles (UAVs) that are searching an uncertain and
threatening environment for targets. One agent models threats in the environment. Another develops a model
of the environment that allows each target to be defined by its own probability distribution. Lastly, an agent
is presented that uses the information from the other agents to generate a near-optimal path plan with a
Dynamic Programming algorithm.


1 Introduction

Unmanned aerial vehicles (UAVs), such as the Predator [1], have been receiving increasing attention recently.
Their versatility and relative inexpensiveness make them appealing in many areas where manned aircraft would
be too dangerous to the pilot or too costly. However, the Predator, while unmanned, still requires a substantial
amount of manpower to operate, since it is still piloted, albeit remotely from a ground station. It is therefore
desirable that future UAVs, like the Boeing X-45A [2], have higher levels of autonomy.

However, directing autonomous agents such as these to behave in an “intelligent” manner is a challenging
problem, especially when there are many constraints on the control of the agents. This is particularly true for
multiple autonomous UAVs searching for targets in an uncertain environment. While this is a type of “search”
problem, many factors set it apart from classical search problems such as those discussed in [6]. Even some
standard search methods become less desirable in the presence of a priori information and multiple vehicles,
where cooperation among the vehicles is brought to the forefront.

The great bulk of the work in cooperative path planning and decision making is in the area of “point-to-point”
paths, where the vehicles have a given destination or set of destinations to travel to, as in [7] and [8]. While
this area has produced many good results, its methods are not well suited to search, where a vehicle’s final
destination is secondary to the exact route it takes (and the targets discovered along it). The first step in the
cooperative search path planning decision process is to model the environment in a manner suitable for
autonomous route planning. Some such methods have been proposed recently in [5] and [9], and in previous work on
this topic by the authors of this paper [3], [4]. This paper differs from that previous work by creating a formal
software architecture.
Figure 1: The architecture of the CPFS Agent.


Within this architecture, additional elements have been added to the previous stochastic modeling and decision
process formulation for the cooperative UAV search problem. First, the assumption that all of the targets’
distributions are completely independent and identically distributed is removed. Second, threats are explicitly
represented in the model of the environment and in the planning algorithm. Third, a more refined definition of
gain is proposed.

Figure 1 shows the internal architecture of the CPFS module. The boxes show the boundaries of structures within
the CPFS module, and the arrows show the interfaces and the flow of information between them. The blue box marks
the boundary of the CPFS module itself. The boxes with green backgrounds are information bases, which passively
store information. The white boxes are the agents, the actors in the module, which are described in the following
sections. These agents use the information from the information bases and coordinate with one another to perform
the functions of the module. The Situation Model outside the module acts as a central information base and as the
gateway to the human users, the simulator, and the other automated agents and reasoners that handle higher-level
tasks.

Section 2 discusses the Search Map Handler, an agent that handles information about the probability of
discovering targets in various regions of the environment. The Threat Map Handler, the agent that deals with
threats (i.e. the danger posed to a vehicle travelling in various parts of the environment), is discussed in
Section 3. Section 4 is concerned with the keystone agent, the Planning Agent, which takes the information from
the other agents and uses it to produce the best path available to the vehicle. Section 5 gives some very basic
simulation results, and Section 6 gives conclusions and continuing work.


2 The Search Map Handler

For this analysis, any object for which the vehicles are searching is called a target, regardless of what other
qualities the target may have (threat value, priority, etc.). A target that is suspected to exist in the
environment but has not been detected is called “suspected.” A target that a vehicle detects but does not locate
with certainty is called “detected.” It is assumed that each detected target has an uncertainty region, of
variable size, within which the target is known to lie. An uncertainty region that covers a large area means that
the target’s position is not known with much certainty. It is also assumed that there is some threshold such
that, if this area is sufficiently small, the target is considered fully “found”; the vehicles then no longer try
to gain more certainty about its location, and it ceases to be a candidate target for the search. Of course, if
the target is destroyed, it also ceases to be a candidate for the search.
2.1 Search Gain Defined

The Search Map Handler should produce a value that represents how valuable it is to search a given area of the
environment. This value is termed “search gain.” The search gain function should maximize the number of real
targets (as opposed to false targets) detected or found. However, since the location and even the existence of
the targets are random, the expected number of targets detected must be maximized instead. To do this, the
vehicle has to choose which cells of the map to search. To find the maximum number of targets, a vehicle will
want to search the cells with the highest expected number of targets. This expectation depends on the number of
possible targets in the cell and on three other factors. For a target $i$, let

• $F^i_x$ be the event that target $i$ is detected in cell $x$.

• $T^i_x$ be the event that target $i$ actually is in cell $x$.

• $E^i$ be the event that target $i$ exists (is a real target).

In order for a target to be within a cell (event $T^i_x$), the target must first exist (event $E^i$); thus
$T^i_x \subseteq E^i$. Also, since a target that exists must be somewhere,

$$\sum_{x} P(T^i_x \mid E^i) = 1. \tag{1}$$

The probability of discovering target $i$ in cell $x$ is determined by the probability of events $F^i_x$,
$T^i_x$, and $E^i$ all occurring, i.e. $P(F^i_x \cap T^i_x \cap E^i)$. But since $T^i_x \subseteq E^i$,
$P(F^i_x \cap T^i_x \cap E^i) = P(F^i_x \cap T^i_x)$, which can be written as

$$P(F^i_x \cap T^i_x) = P(F^i_x \mid T^i_x)\,P(T^i_x) \tag{2}$$

by conditional probability. Since event $E^i$ completely includes event $T^i_x$,

$$P(T^i_x) = P(T^i_x \cap E^i) = P(T^i_x \mid E^i)\,P(E^i). \tag{3}$$

Putting these two equations together,

$$P(F^i_x \cap T^i_x) = P(F^i_x \mid T^i_x)\,P(T^i_x \mid E^i)\,P(E^i). \tag{4}$$

$P(F^i_x \mid T^i_x)$ is the probability that a target is detected by a sensor search of cell $x$ given that the
target is in cell $x$. This is defined as the sensor efficiency, $p$, which in this work is assumed to be the same,
and constant, for all cells and targets, and to be available a priori (e.g. as a function of the sensor type).
Also, $P(E^i)$, the probability that target $i$ is a real target, is labeled $\zeta^i$. Lastly, $P(T^i_x \mid E^i)$
is the probability that, given that target $i$ is a real target, it is in a particular cell $x$. This can be found
from the given distribution of the target (uniform over the uncertainty area by default) or from a priori
information. It is not assumed to be the same for every target.

Let $w^i_x$ be a random variable such that

$$w^i_x = \begin{cases} 1 & \text{if } F^i_x,\ T^i_x, \text{ and } E^i \text{ all occur,} \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$

This means that $w^i_x = 1$ if target $i$ is detected in cell $x$, and $w^i_x = 0$ otherwise. Define the set of all
possible targets in cell $x$ as $G_x$. Let $\sigma$ be a function that takes as an argument a cell (or a collection
of cells) and returns the search gain for searching that cell (or the sum of the search gains for searching each
individual cell in that collection). The search gain is then calculated by taking the expectation of the number
of targets found in cell $x$ over $G_x$:

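$$\begin{aligned} \sigma(x) &= E\{\#\ \text{targets detected in } x\} \\ &= E\Big\{\sum_{i \in G_x} w^i_x\Big\} \\ &= \sum_{i \in G_x} E\{w^i_x\} \\ &= \sum_{i \in G_x} p\,\zeta^i\,P(T^i_x \mid E^i). \end{aligned} \tag{6}$$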
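As a minimal illustration of Eq. 6, the Python sketch below computes $\sigma(x)$ directly from each target’s
existence probability $\zeta^i$ and its per-cell probability $P(T^i_x \mid E^i)$. The `SuspectedTarget` class, its
field names, and the use of `(row, col)` tuples as cell keys are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class SuspectedTarget:
    """Hypothetical container for one target i."""
    zeta: float  # P(E^i): probability that target i is a real target
    # P(T^i_x | E^i) for each cell x in the target's uncertainty region
    cell_prob: dict = field(default_factory=dict)


def search_gain(cell, targets, p):
    """Eq. 6: sigma(x) = sum over i in G_x of p * zeta^i * P(T^i_x | E^i)."""
    gain = 0.0
    for t in targets:
        prob_in_cell = t.cell_prob.get(cell, 0.0)
        if prob_in_cell > 0.0:  # only targets that may be in the cell (G_x) contribute
            gain += p * t.zeta * prob_in_cell
    return gain


def search_gain_region(cells, targets, p):
    """sigma applied to a collection of cells: the sum of the per-cell gains."""
    return sum(search_gain(c, targets, p) for c in cells)


# Example: target A is uniform over four cells, target B is split over two cells.
targets = [
    SuspectedTarget(0.5, {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}),
    SuspectedTarget(1.0, {(0, 0): 0.5, (2, 2): 0.5}),
]
gain_00 = search_gain((0, 0), targets, p=0.8)
```

Here the cell shared by both targets collects a contribution from each, which is exactly the sum over $G_x$ in Eq. 6.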
2.2 The Search Map

The values needed to calculate the search gain from Eq. 6 can be stored efficiently. First, the map is defined as
a bounded rectangular area divided into discrete, equal-area cells. For each cell of the map there exists an array
(or vector) that stores several values. The first value in the array is the product of the vehicles’ search: a
value that represents the reduction in uncertainty about what is in the cell due to the action of the vehicles.
Let $\nu_x$, the number of times cell $x$ has been searched, represent this value for cell $x$.

The second and subsequent values in the array are calculated from information about particular targets that have
some probability of being located in the cell. For each target, these cells can be found by taking the area in
which that target may be found (e.g. the uncertainty region) and determining which of the map cells lie within
this area. Then a value known as the relative probability value ($C^i_x$) for target $i$ in cell $x$ is
calculated, using the distribution that is given for that target. If no a priori information is available to
create this distribution, the target is assumed to be uniformly distributed over the whole search area. Each cell
of the map holds an array (with as many elements as $G_x$) of these $C^i_x$ values, one for each target. Thus the
total size of the array for each cell $x$ is the number of targets that may be in the cell plus one (for $\nu_x$).

Note that when the probability value of one cell for a specific target changes, the value of every other cell
where that target may be located also changes. Updating several cells whenever a single one changes is not very
efficient, especially when there are a large number of cells. The relative probability values are therefore
defined such that only the value of the searched cell changes; the others do not.

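The following shows how these relative probability values can be used to calculate the search gain. Let $P^i_x$
be the probability of target $i$ being in cell $x$ (given that the target exists), let $N_i$ be the number of
cells in the uncertainty area of the target, and let $A_x$ be the area of cell $x$. Then

$$P^i_x = P(T^i_x \mid E^i) = \frac{A_x C^i_x}{\sum_{n=1}^{N_i} A_{x_n} C^i_{x_n}}. \tag{7}$$

Now, $A_x$ is the same constant for every cell, so

$$P^i_x = \frac{C^i_x}{\sum_{n=1}^{N_i} C^i_{x_n}}. \tag{8}$$

Then, letting

$$V^i = \frac{1}{\sum_{n=1}^{N_i} C^i_{x_n}}, \tag{9}$$

Equation 7 simplifies to

$$P^i_x = V^i C^i_x. \tag{10}$$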

Since $V^i$ and $\zeta^i$ are the same for every cell for a particular $i$, these values do not need to be stored
in the arrays for the cells of the map. Instead, they can be stored in a list that records, for each target $i$,
its $\zeta^i$ and $V^i$ values. Using Eq. 6 and Eq. 10, the search gain can now be written as

$$\sigma(x) = \sum_{i \in G_x} p\,\zeta^i C^i_x V^i, \tag{11}$$

where each of these values is available either as a priori information ($p$) or from the search map
($\zeta^i$, $C^i_x$, $V^i$).
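The storage scheme of Eqs. 9 to 11 can be sketched in Python as follows, assuming equal-area cells so that the
$A_x$ terms of Eq. 7 cancel. Per-cell arrays hold $\nu_x$ and the $C^i_x$ values, while $\zeta^i$ and $V^i$ are
kept once per target, as described above. The class and method names (`SearchMap`, `add_target`,
`prob_in_cell`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TargetRecord:
    """Values that are identical for every cell, so they are stored once per target."""
    zeta: float  # P(E^i)
    V: float     # V^i = 1 / sum_n C^i_{x_n}  (Eq. 9)


@dataclass
class SearchMap:
    nu: dict = field(default_factory=dict)       # cell -> nu_x, times the cell was searched
    C: dict = field(default_factory=dict)        # cell -> {target id: C^i_x}
    targets: dict = field(default_factory=dict)  # target id -> TargetRecord

    def add_target(self, i, zeta, relative_values):
        """Store C^i_x in every cell of the uncertainty region and compute V^i."""
        for x, c in relative_values.items():
            self.C.setdefault(x, {})[i] = c
        self.targets[i] = TargetRecord(zeta, 1.0 / sum(relative_values.values()))

    def prob_in_cell(self, i, x):
        """Eq. 10: P^i_x = V^i * C^i_x."""
        return self.targets[i].V * self.C.get(x, {}).get(i, 0.0)

    def search_gain(self, x, p):
        """Eq. 11: sigma(x) = sum over i in G_x of p * zeta^i * C^i_x * V^i."""
        return sum(p * self.targets[i].zeta * c * self.targets[i].V
                   for i, c in self.C.get(x, {}).items())


m = SearchMap()
m.add_target("A", zeta=0.5, relative_values={(0, 0): 1.0, (0, 1): 1.0, (1, 0): 2.0})
```

The relative values need not sum to one; $V^i$ absorbs the normalization, so $P^i_x$ is recovered on demand.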


2.3 Updating The Search Map

For each target $i \in G_x$ and for every cell $x$ searched by a vehicle at a given time step (i.e. the cells that
fall within the vehicle’s sensor footprint), one of three mutually exclusive outcomes can occur before the next
time step:

• Target not detected.

• Target detected but not found.

• Target detected and fully found, or target destroyed.


2.3.1 Target not Detected

Let $\bar F^i_{x_1}$ be the event that target $i$ is not detected on a search of cell $x_1$. On a search of the cell,
either the target is detected or it is not, so $P(F^i_{x_1}) + P(\bar F^i_{x_1}) = 1$. This holds regardless of
whether the target is actually in the cell, so $P(F^i_{x_1} \mid T^i_{x_1}) + P(\bar F^i_{x_1} \mid T^i_{x_1}) = 1$.
So,

$$P(\bar F^i_{x_1} \mid T^i_{x_1}) = 1 - P(F^i_{x_1} \mid T^i_{x_1}). \tag{12}$$

Since $P(F^i_{x_1} \mid T^i_{x_1}) = p$,

$$P(\bar F^i_{x_1} \mid T^i_{x_1}) = 1 - p. \tag{13}$$

Now, when a cell is searched and target $i$ is not detected, the map should record the updated probability that
target $i$ is in that cell (i.e. that it is really still there). Assuming that the probability of existence of the
target does not change due to this sensor sweep, the updated probability that the target is located in the cell
after the search (given that the target exists) is given by conditional probability as

$$P(T^i_{x_1} \mid \bar F^i_{x_1}, E^i) = \frac{P(\bar F^i_{x_1} \cap T^i_{x_1} \cap E^i)}{P(\bar F^i_{x_1} \cap E^i)}. \tag{14}$$


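The fact that $T^i_{x_1} \subseteq E^i$ means that if $T^i_{x_1}$ is given, then event $E^i$ is given as well. This
and Eq. 3 let Eq. 14 be written as

$$P(T^i_{x_1} \mid \bar F^i_{x_1}, E^i) = \frac{P(\bar F^i_{x_1} \mid T^i_{x_1})\,P(T^i_{x_1} \mid E^i)\,P(E^i)}{P(\bar F^i_{x_1} \mid E^i)\,P(E^i)} = \frac{P(\bar F^i_{x_1} \mid T^i_{x_1})\,P(T^i_{x_1} \mid E^i)}{P(\bar F^i_{x_1} \mid E^i)}. \tag{15}$$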


The event whose probability is given by $P(\bar F^i_{x_1} \mid E^i)$ occurs under two conditions. The first is that
the target is not located in the cell, an event with probability $1 - P(T^i_{x_1} \mid E^i)$. The second is that
the target is in the cell but was not detected, an event with probability
$P(\bar F^i_{x_1} \mid T^i_{x_1})\,P(T^i_{x_1} \mid E^i)$. Since these two events are mutually exclusive, their
probabilities add. Using Eq. 13, this sum can be written as

$$\begin{aligned} P(\bar F^i_{x_1} \mid E^i) &= 1 - P(T^i_{x_1} \mid E^i) + (1-p)\,P(T^i_{x_1} \mid E^i) \\ &= 1 - p\,P(T^i_{x_1} \mid E^i). \end{aligned} \tag{16}$$


Thus, Eq. 15 becomes

$$P(T^i_{x_1} \mid \bar F^i_{x_1}, E^i) = \frac{(1-p)\,P(T^i_{x_1} \mid E^i)}{1 - p\,P(T^i_{x_1} \mid E^i)}. \tag{17}$$


Note that this update equation is valid only for the cell $x_1$ referred to by event $\bar F^i_{x_1}$, even though
the probabilities of the target being located in all of the other cells on the map also change. Let $x_2$ be an
arbitrary cell that is not referred to by event $\bar F^i_{x_1}$. The a posteriori (after event $\bar F^i_{x_1}$)
probability for this cell (again, given that target $i$ exists) is given by

$$P(T^i_{x_2} \mid \bar F^i_{x_1}, E^i) = \frac{P(\bar F^i_{x_1} \mid T^i_{x_2})\,P(T^i_{x_2} \mid E^i)}{P(\bar F^i_{x_1} \mid E^i)}. \tag{18}$$


It is assumed for the purposes of this paper that the sensor will not misplace the target if it is detected, so if
the target is in cell $x_2$ it will not be detected in cell $x_1$, and $P(\bar F^i_{x_1} \mid T^i_{x_2}) = 1$. Using
this and Eq. 16, Eq. 18 becomes

$$P(T^i_{x_2} \mid \bar F^i_{x_1}, E^i) = \frac{P(T^i_{x_2} \mid E^i)}{1 - p\,P(T^i_{x_1} \mid E^i)}. \tag{19}$$


The map handler now needs to record the new (a posteriori) probability values on the map. However, the map does
not store the probability values $P^i_x$ directly; instead, the relative probability values $C^i_x$ are stored. Let
$x_1$ be the cell that was searched and is being updated, and let $x_2$ be any other arbitrary cell with
$C^i_{x_2} \neq 0$. To simplify the notation, a prime ($'$) denotes an a posteriori value (i.e.
$P^{i\prime}_{x_n} = P(T^i_{x_n} \mid \bar F^i_{x_1}, E^i)$ for $n \in \{1, 2\}$). Now, the relative probability
values are defined so that

$$\frac{P^i_{x_1}}{P^i_{x_2}} = \frac{C^i_{x_1}}{C^i_{x_2}} \tag{20}$$
and

$$\frac{P^{i\prime}_{x_1}}{P^{i\prime}_{x_2}} = \frac{C^{i\prime}_{x_1}}{C^{i\prime}_{x_2}}. \tag{21}$$
However, $C^{i\prime}_{x_2} = C^i_{x_2}$, since only $x_1$ was searched. Using Eq. 17 and Eq. 19,

$$\frac{C^{i\prime}_{x_1}}{C^i_{x_2}} = \frac{P^{i\prime}_{x_1}}{P^{i\prime}_{x_2}} = \frac{(1-p)\,P^i_{x_1}}{P^i_{x_2}}. \tag{22}$$


Now, using Eq. 20,

$$\frac{C^{i\prime}_{x_1}}{C^i_{x_2}} = \frac{(1-p)\,C^i_{x_1}}{C^i_{x_2}}. \tag{23}$$

Therefore,

$$C^{i\prime}_{x_1} = (1-p)\,C^i_{x_1}. \tag{24}$$

So for the case where a cell is searched and a target is not detected, Eq. 24 is used to update the value in the
array for cell $x$ for that target. No other cell value is changed, but the value of $V^i$ must also be updated.
Since the changes to the map are made incrementally, one cell at a time, $V^i$ can be changed in the same manner,
so that a running value is easily maintained. Now, with $x_1$ the searched cell,

$$V^i = \frac{1}{C^i_{x_1} + C^i_{x_2} + \cdots + C^i_{x_{N_i}}}. \tag{25}$$


It is desired that the $C^i_{x_1}$ term in the denominator be updated with the new value $C^{i\prime}_{x_1}$. Thus,
$V^i$ can be updated (to $V^{i\prime}$) by using

$$V^{i\prime} = \frac{1}{\dfrac{1}{V^i} + \left(C^{i\prime}_{x_1} - C^i_{x_1}\right)}. \tag{26}$$
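The following Python sketch checks, under the assumptions of this subsection (equal-area cells and no detection of
the target in a cell it does not occupy), that the inexpensive update of Eqs. 24 and 26 reproduces the full
Bayesian posterior of Eqs. 17 and 19 for a single target. The function names and the dictionary representation
of the cells are hypothetical.

```python
def update_not_detected(C, V, x1, p):
    """Eqs. 24 and 26 for one target i that was not detected on a search of cell x1.

    C maps cell -> C^i_x and is modified in place; V is the current V^i.
    Only the searched cell's entry changes.  Returns the updated V^i.
    """
    old = C[x1]
    new = (1.0 - p) * old                   # Eq. 24
    C[x1] = new
    return 1.0 / (1.0 / V + (new - old))    # Eq. 26: running update of V^i


def bayes_posterior(P, x1, p):
    """Direct posterior for every cell: Eq. 17 for x1, Eq. 19 for all other cells."""
    denom = 1.0 - p * P[x1]                 # Eq. 16: P(not detected in x1 | E^i)
    return {x: ((1.0 - p) * q if x == x1 else q) / denom for x, q in P.items()}


C = {"a": 1.0, "b": 1.0, "c": 2.0}
V = 1.0 / sum(C.values())                   # Eq. 9
P = {x: V * c for x, c in C.items()}        # prior P^i_x from Eq. 10
V_new = update_not_detected(C, V, "c", p=0.8)
post = bayes_posterior(P, "c", p=0.8)
```

Because only the searched cell’s entry and the single scalar $V^i$ change, each non-detection costs constant time
per target instead of a pass over the whole uncertainty region.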


2.3.2 Target Detected

If a target $i$ is detected, all previous entries for that target (i.e. all of its $C^i_x$ values) are removed from
the arrays of all of the cells. New $C^i_x$ values are then generated from the new distribution that becomes
available because the target was detected, and stored in the proper arrays of the cells of the map; the $\zeta^i$
value for that target is recorded as well. Next, the $C^i_x$ values must be updated to account for any searching
of those cells that has already occurred. For each cell in the area of the new distribution of that target, Eq. 24
is applied to the map value a number of times equal to $\nu_x$ for that cell. The value of $V^i$ is also calculated
and stored.
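A minimal Python sketch of this re-seeding step follows. It assumes the per-cell arrays are dictionaries keyed by a
target identifier and that every earlier search of a cell failed to detect the target, which is what applying
Eq. 24 $\nu_x$ times amounts to. The helper names are hypothetical; `remove_target` is the same operation that
Section 2.3.3 requires for a found or destroyed target.

```python
def remove_target(C, i):
    """Drop every C^i_x entry for target i from all cell arrays."""
    for cell_values in C.values():
        cell_values.pop(i, None)


def reseed_detected_target(C, nu, i, new_values, p):
    """Replace target i's entries with values from its new distribution.

    C: cell -> {target id: C^i_x};  nu: cell -> nu_x (times searched);
    new_values: cell -> C^i_x from the distribution given by the detection.
    Returns the new V^i.
    """
    remove_target(C, i)
    for x, c in new_values.items():
        # Applying Eq. 24 nu_x times is the same as multiplying by (1 - p) ** nu_x.
        C.setdefault(x, {})[i] = c * (1.0 - p) ** nu.get(x, 0)
    return 1.0 / sum(C[x][i] for x in new_values)   # V^i, as in Eq. 9


C = {(0, 0): {"A": 1.0, "B": 3.0}, (0, 1): {"A": 1.0}}
nu = {(0, 0): 2}
V_A = reseed_detected_target(C, nu, "A", {(0, 0): 1.0, (1, 1): 1.0}, p=0.5)
```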

2.3.3 Target Found / Destroyed

This requires removing the target from the map entirely, since the target is no longer a candidate for search,
either because it no longer exists or because its location is known with enough certainty. In this case, it is
removed from the array of possible targets for every cell, and its value is no longer used when calculating the
number of (uncertain) targets expected to be found in a search of any particular cell.

3 The Threat Map Handler

This section describes how threats are modeled in a form that can be incorporated efficiently by the Threat Map
Handler and stored by the Threat Map Information Base.

The level of danger to a flying UAV may depend on several factors, including threats from opposing forces, such as
SAM sites or random ground fire, and the terrain itself in the form of hills, mountains, etc. Since several of these
factors are directly related to altitude, altitude must also be explicitly accounted for.


Figure 2: A simplified threat profile.


Let $a_u$ be the altitude of the UAV. Let $a_g$ be the altitude of the ground or of some other terrain feature into
which the UAV could crash if they are at the same altitude. Thus, the probability of being destroyed by the terrain
($D_g$) is

$$D_g = \begin{cases} 0 & a_u > a_g \\ 1 & a_u \le a_g \end{cases} \tag{27}$$


This is true anywhere in the environment, but since $a_g$ can vary across regions of the environment, a map must
record the value of $a_g$ in different regions. A two-dimensional map like the search map will work. Also like the
search map, this terrain map is divided into a grid of equal-sized cells, whose size can be adjusted from run to
run and need not match the size of the cells of the search map.

It is assumed that there is an altitude ($a_f$) above which the vehicle is safe from ground fire, but below which
it is threatened. Let $D_f$ be the probability of being destroyed by ground fire. This value is given as

$$D_f = \begin{cases} 0 & a_u > a_f \\ F & \text{otherwise} \end{cases} \tag{28}$$


The value of $F \in [0, 1]$ is assumed in this paper to be constant over the entire environment.
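Eqs. 27 and 28, together with the terrain map described above, can be sketched in Python as below. The grid layout
(a row-major list of lists with square cells of side `cell_size` and the origin at a map corner) and all function
names are assumptions made for illustration.

```python
def terrain_altitude(terrain_grid, cell_size, x, y):
    """Look up a_g for ground position (x, y) in a grid of square cells (assumed layout)."""
    row, col = int(y // cell_size), int(x // cell_size)
    return terrain_grid[row][col]


def terrain_kill_prob(a_u, a_g):
    """Eq. 27: D_g is 0 above the terrain and 1 at or below it."""
    return 0.0 if a_u > a_g else 1.0


def ground_fire_kill_prob(a_u, a_f, F):
    """Eq. 28: D_f is 0 above the ground-fire ceiling a_f and F otherwise."""
    return 0.0 if a_u > a_f else F


terrain = [[100.0, 200.0],
           [300.0, 400.0]]
a_g = terrain_altitude(terrain, cell_size=10.0, x=15.0, y=5.0)
```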

Let $r^i_u$ be the distance from the point directly below the UAV (at 0 altitude) to threat $i$. This is a function
of the $x$ and $y$ location of the vehicle, $(x_u, y_u)$, and is calculated from the threat location $(x^i, y^i)$.
Let $r^i(a_u)$ be the minimum safe distance from the threat for a vehicle at altitude $a_u$, which must be
calculated using the threat range profile for threat $i$. The threat range profile is assumed to be known a priori
and to look something like that shown in Fig. 2.

Fig. 2 shows several key values of the profile. The distance $r_0$ is the largest distance at which the SAM can hit
a vehicle at 0 altitude. The distance $r_{max}$ is the farthest away the SAM site can fire. The altitude $a_{max}$
is the highest altitude at which a vehicle can be fired upon by the SAM site. The altitude $a_{top}$ is the value
above which altitude no longer changes the threat range (until $a_{max}$ is reached). It is assumed that the
minimum altitude is 0 and that the minimum safe distance boundary can be given as a line. Thus,

$$r^i(a_u) = \begin{cases} 0 & a_u > a_{max} \\ r_{max} & a_{top} \le a_u \le a_{max} \\ \dfrac{a_u}{a_{top}}\,(r_{max} - r_0) + r_0 & 0 \le a_u < a_{top} \end{cases} \tag{29}$$

Let $D^i$ be the probability of being destroyed by threat $i$. This is given by

$$D^i = \begin{cases} 0 & r^i_u > r^i(a_u) \\ \Gamma^i & \text{otherwise} \end{cases} \tag{30}$$


The value of $\Gamma^i \in [0, 1]$ for threat $i$ is assumed in this paper to be constant over the whole area
covered by that threat.
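A Python sketch of the range profile of Eq. 29 and the per-threat kill probability of Eq. 30 follows. The
dictionary keys used to describe a threat are hypothetical. One interpretation is made explicit: a vehicle above
$a_{max}$ is treated as safe even when it is directly over the site ($r^i_u = 0$), following the description of
$a_{max}$ as the highest altitude at which a vehicle can be fired upon.

```python
import math


def min_safe_distance(a_u, r0, r_max, a_top, a_max):
    """Eq. 29: piecewise-linear minimum safe distance r^i(a_u)."""
    if a_u > a_max:
        return 0.0                              # above the SAM's ceiling
    if a_u >= a_top:
        return r_max                            # flat part of the profile
    return (a_u / a_top) * (r_max - r0) + r0    # rises linearly from r0 at a_u = 0


def sam_kill_prob(x_u, y_u, a_u, threat):
    """Eq. 30: D^i is 0 outside the minimum safe distance and Gamma^i inside it."""
    if a_u > threat["a_max"]:
        return 0.0                              # cannot be fired upon (covers r^i_u = 0)
    r_u = math.hypot(x_u - threat["x"], y_u - threat["y"])   # ground distance r^i_u
    r_safe = min_safe_distance(a_u, threat["r0"], threat["r_max"],
                               threat["a_top"], threat["a_max"])
    return 0.0 if r_u > r_safe else threat["gamma"]


sam = {"x": 0.0, "y": 0.0, "r0": 10.0, "r_max": 20.0,
       "a_top": 5.0, "a_max": 15.0, "gamma": 0.3}
```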

The threat map handler records the location and type of all SAMs. Then, for a given UAV location, it can calculate
the threat level from a SAM to the UAV using the profile information for that type of SAM. To determine the total
threat level to a UAV, the map handler calculates the values of $D_g$, $D_f$, and $D^i$ for all threats
$i = 1, \ldots, n$ that threaten a vehicle at a given $(x_u, y_u, a_u)$ location, using Eq. 27, Eq. 28, and Eq. 30.
The total threat ($d$) facing a vehicle at that location in the environment can then be found by

$$d = 1 - \left[(1 - D_g)(1 - D_f)(1 - D^1)\cdots(1 - D^n)\right]. \tag{31}$$
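Eq. 31 multiplies the survival probabilities of the individual hazards, which treats them as independent. A short
Python sketch (the function name is hypothetical):

```python
def total_threat(D_g, D_f, D_sams):
    """Eq. 31: d = 1 - (1 - D_g)(1 - D_f)(1 - D^1)...(1 - D^n)."""
    survive = (1.0 - D_g) * (1.0 - D_f)
    for D_i in D_sams:          # one factor per threatening SAM site
        survive *= 1.0 - D_i
    return 1.0 - survive


# Above the terrain, inside ground-fire range (F = 0.1), and inside two SAM envelopes.
d = total_threat(0.0, 0.1, [0.3, 0.5])
```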


4 The Planning Agent

The Planning Agent takes the information from the other agents and uses it to decide the path that produces the
largest expected number of targets detected over (ideally) the entire lifetime of the vehicle, which is $N$ time
steps, or (non-ideally) over a smaller, more feasible planning horizon of $q$ ($q < N$) time steps.

In order for the vehicles to plan their trajectories, they need information about the state of the environment.
The true state of the environment, after discretization, is represented by (1) an information base like those
maintained by the agents, except that it is updated by every vehicle at every time step (producing an ideal
information base), and (2) the locations and headings of the vehicles. This true state is denoted by $\bar x_k$.
The state as perceived by an individual vehicle is denoted by $x_k$. Under ideal conditions, every vehicle
perceives the state as the true state ($\bar x_k = x_k$), but in the non-ideal case each vehicle may have a
different perception based on the information available to it.

The choice of which path a vehicle travels at time $k$ is the decision, or control, $u_k \in U$, where $U$ is the
set of choices available to the vehicle (for example: turn 15° left, go straight, or turn 15° right). Let $J_k$ be
defined as the “cost-to-go” function from time step $k$ to $N$. Then the best path at time step $k$ can be found
using the DP recursion over the planning horizon, $k$ to $N$ (ideal) or $k$ to $k + q$ (non-ideal):

$$J_k(x_k) = \max_{u_k \in U}\left\{g(x_k, u_k) + J_{k+1}\big(f(x_k, u_k)\big)\right\} \tag{32}$$

where $g(x_k, u_k)$ is the one-step gain function and $f(x_k, u_k)$ is the resulting next state. Note that if the
expectation of the stochastic elements is encapsulated in the gain function, this is a deterministic equation and
can be solved using a shortest-path algorithm.
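To make Eq. 32 concrete, the Python sketch below solves the finite-horizon recursion for a single vehicle on a grid,
with eight 45-degree headings and a control set $U$ of {turn left, go straight, turn right}. The one-step gain is
simply read from a fixed grid, which ignores the map updates, interference, and threats discussed elsewhere in this
paper, and the value at the horizon is taken as 0; the grid, headings, and names are all hypothetical. It
illustrates the recursion only, not the planner itself.

```python
from functools import lru_cache

# Eight headings in 45-degree steps, as (d_row, d_col); index 2 points east.
HEADINGS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
CONTROLS = (-1, 0, 1)  # assumed U: turn left 45 deg, go straight, turn right 45 deg


def plan(gain, start, heading, horizon):
    """Eq. 32: J_k(x) = max over u of [g(x, u) + J_{k+1}(f(x, u))], with J = 0 at the horizon.

    Returns (best total gain, tuple of controls) from state (start, heading).
    """
    rows, cols = len(gain), len(gain[0])

    def f(state, u):
        """State transition: turn by u, then move one cell; None if it leaves the map."""
        r, c, h = state
        h = (h + u) % len(HEADINGS)
        nr, nc = r + HEADINGS[h][0], c + HEADINGS[h][1]
        return (nr, nc, h) if 0 <= nr < rows and 0 <= nc < cols else None

    @lru_cache(maxsize=None)
    def J(k, state):
        if k == horizon:
            return 0.0, ()
        best = (float("-inf"), ())
        for u in CONTROLS:
            nxt = f(state, u)
            if nxt is None:
                continue
            future, controls = J(k + 1, nxt)
            value = gain[nxt[0]][nxt[1]] + future   # g(x_k, u_k) + J_{k+1}
            if value > best[0]:
                best = (value, (u,) + controls)
        return best

    return J(0, (start[0], start[1], heading))


grid = [[0, 0, 0],
        [0, 0, 5],
        [0, 0, 0]]
value, controls = plan(grid, start=(1, 0), heading=2, horizon=2)
```

Memoizing $J$ on (time step, state) means each pair is evaluated once, which is what keeps the deterministic form of
the recursion tractable.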

To ensure cooperation, the other vehicles are modeled as stochastic elements, where a random quantity $w_k$
represents the loss in search gain actually received by the vehicle, compared to what was expected when the
decision was made, because of interference by another vehicle. The vehicles can then use the expected value of
$w_k$ to plan where to go to avoid undue interference.

The one-step gain at each step is then the expected search gain the vehicle would get if no interference were
present, minus the expected amount of search gain it would lose to another vehicle interfering:

$$g(x_k, u_k) = E\{\sigma_k - w_k\}. \tag{33}$$
This one-step gain is then used in the DP recursion (Equation (32)) to determine the expected optimal path.

The amount of interference a vehicle expects is a function of $p$, $\sigma_k$, and the probability of another
vehicle interfering, which is denoted $\Psi_k$. So

$$E\{w_k\} = p\,\sigma_k \Psi_k. \tag{34}$$

Adding in the interference calculations, we have

$$g = E\{\sigma_k - w_k\} = \sigma_k - p\,\sigma_k \Psi_k, \tag{35}$$

where $\sigma_k$ is the search gain (with no interference), $p$ is the sensor efficiency, and $\Psi_k$ is the
probability of interference. The value $\Psi_k \in [0, 1]$ penalizes the vehicles for searching areas that have some
probability of being searched by another vehicle first. In effect, it forces the vehicles to spread apart. This is
covered in much more detail in [3] and [4].
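Eqs. 34 and 35 reduce to a one-line computation. The Python sketch below (the function name and numbers are
hypothetical) also shows the spreading effect described above: a cell with a higher raw search gain can lose out to
a lower-gain cell once the probability of interference is accounted for.

```python
def one_step_gain(sigma, p, psi):
    """Eq. 35: g = sigma - p * sigma * Psi, using E{w_k} = p * sigma * Psi (Eq. 34)."""
    if not 0.0 <= psi <= 1.0:
        raise ValueError("Psi is a probability and must lie in [0, 1]")
    return sigma - p * sigma * psi


contested = one_step_gain(sigma=1.0, p=0.8, psi=0.5)    # likely to be searched by another vehicle
uncontested = one_step_gain(sigma=0.7, p=0.8, psi=0.0)  # no other vehicle expected nearby
```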

4.1 Adding Threats into the Planning Algorithm

Threats can be incorporated into the decision model by recognizing what can happen to a vehicle if it enters a
threatened area: 1.) nothing happens (i.e. the vehicle travels as normal), or 2.) the vehicle is shot down, or
crashes, and is destroyed. For the moment, there are assumed to be no intermediate states (i.e. no damaged state).
Having a vehicle destroyed means that the vehicle is no longer available to search for the remainder of this
mission (or any future mission). So there is now a new state that a vehicle can enter at any time during the
mission: dead. If a vehicle is destroyed at time $k$, it moves to this state at time step $k + 1$, and its one-step
gain ($g_{k+1}$) for this state and for all future states is 0, since the vehicle cannot leave the dead state.
However, it does not lose any gain it had previously received.

Additionally, if the vehicle is destroyed, there is also a cost for its not being available for future missions.
So $J_N$, the final reward at the end of the mission, now has two different values. If a vehicle is destroyed
during the mission, $J_N$ should be 0, representing that the vehicle cannot provide any more gain in future
missions. If a vehicle is not shot down, $J_N$ should be $\lambda$, where $\lambda$ is a positive reward for not
being destroyed that represents the value of having the vehicle available for future missions.

4.2 Utilizing the Information from the Search and Threat Map Handlers

Continuing with this analysis, we can now write the expected cost-to-go (or reward) for taking an arbitrary path by
expanding the recursion of Eq. 32:

$$\begin{aligned} E\{J_k\} = (1 - d_k)\Big[&\sigma_k - p\,\sigma_k\Psi_k + {} \\ &(1 - d_{k+1})\big[\sigma_{k+1} - p\,\sigma_{k+1}\Psi_{k+1} + \cdots + {} \\ &(1 - d_{k+q})[\sigma_{k+q} - p\,\sigma_{k+q}\Psi_{k+q} + \lambda]\big]\Big]. \end{aligned} \tag{36}$$

Renaming $(1 - d_i)$ as $s_i$ for arbitrary $i$, factoring out $\sigma$, and letting $I_i = (1 - p\Psi_i)$, Eq. 36
can be written as

$$E\{J_k\} = s_k\Big[\sigma_k I_k + s_{k+1}\big[\sigma_{k+1} I_{k+1} + \cdots + s_{k+q}[\sigma_{k+q} I_{k+q} + \lambda]\big]\Big], \tag{37}$$

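which can then be decomposed into separate time steps by letting

$$\delta_i = \prod_{j=k}^{i} s_j \tag{38}$$

and then noting that

$$E\{J_k\} = [\sigma_k \delta_k I_k] + [\sigma_{k+1}\delta_{k+1} I_{k+1}] + \cdots + [\sigma_{k+q}\delta_{k+q} I_{k+q}] + \delta_{k+q}\lambda. \tag{39}$$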

All the terms for a given time step are thus collected together. Since the one-step gain at a particular time step
depends only on the current and previous time steps, these terms can be converted to a cost (distance), and the
problem can be solved as a shortest-path problem like those in [3] and [4].
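The equivalence of the nested form (Eq. 36) and the per-step decomposition (Eq. 39) can be checked numerically. The
Python sketch below evaluates both for one candidate path, given per-step search gains $\sigma_i$, interference
probabilities $\Psi_i$, threat levels $d_i$, the sensor efficiency $p$, and the terminal reward $\lambda$; the
function names and sample numbers are illustrative assumptions.

```python
def expected_reward_nested(sigma, psi, d, p, lam):
    """Eq. 36: evaluate the nested brackets from the innermost (last step) outward."""
    value = lam                                   # terminal reward if the vehicle survives
    for s_k, psi_k, d_k in reversed(list(zip(sigma, psi, d))):
        value = (1.0 - d_k) * (s_k - p * s_k * psi_k + value)
    return value


def expected_reward_decomposed(sigma, psi, d, p, lam):
    """Eq. 39: sum of per-step terms sigma_i * delta_i * I_i, plus delta_{k+q} * lambda."""
    total, delta = 0.0, 1.0
    for s_k, psi_k, d_k in zip(sigma, psi, d):
        delta *= 1.0 - d_k                        # Eq. 38: delta_i = product of s_j = 1 - d_j
        total += s_k * delta * (1.0 - p * psi_k)  # I_i = 1 - p * Psi_i
    return total + delta * lam


path = dict(sigma=[1.0, 2.0, 0.5], psi=[0.0, 0.5, 0.2], d=[0.1, 0.0, 0.3], p=0.8, lam=3.0)
nested = expected_reward_nested(**path)
decomposed = expected_reward_decomposed(**path)
```

Each bracketed term of the decomposed form depends only on quantities up to its own time step, so it can be attached
to an individual transition of the path.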
Figure 3: Vehicles Searching a Region with Threats.


5 Simulation

Results for this work are obtained using the Boeing Open Experimental Platform (OEP).

Figure 3 shows a trial being conducted in the OEP. Two teams of vehicles are shown, one above the -1 degree latitude
line and one below. The threats, two long-range SAMs and two medium-range SAMs, are shown on the right with their
accompanying threat range rings. These rings show the minimum safe distance at 5000 ft altitude.

Additional simulation studies are underway to explore the vehicles’ behavior and the efficacy of making tradeoffs in
the values used in the algorithm.

6 Conclusions

The architecture of the Cooperative Search Path Planner allows particular aspects of the cooperative search path
planning problem to be refined in an efficient manner. Advances in the sensor model and in the way information about
targets and uncertainty is given and stored are covered by the Search Map Handler. The level of risk posed to a
vehicle by various threats in the environment is maintained by the Threat Map Handler. The Planning Agent then uses
the information provided by the other agents in a Dynamic Programming algorithm to find near-optimal solutions.

However, there are still many avenues for refining this work further. Allowing a variable value of $p$ in the
formulation would allow multiple types of sensors to be incorporated into the model. Additionally, removing the
assumption that $P(\bar F^i_{x_1} \mid T^i_{x_2}) = 1$ would allow for a more robust sensor model. More complex
threat environments can also be created by allowing $F$ and $\Gamma^i$ to vary. Also of particular interest is the
value $\Psi$, which is currently implemented by reducing the gain seen by a vehicle, but without taking the threats
into account. With threats, there can be an additional component that accounts for the fact that other vehicles may
be shot down before they can interfere. This would lessen the penalty to the search gain that forces the vehicles
away from one another when searching critical but dangerous areas. Additionally, the presence of targets that may be
detected and require another vehicle to take a second look could eliminate the effect of the interference in certain
cases.